Abstract

Identifying the most influential individuals can provide invaluable
help in developing and deploying effective viral marketing
strategies. Previous studies mainly focus on designing efficient
algorithms or heuristics to find the top-K influential nodes on a
given static social network. However, real-world social networks
keep evolving over time, and recalculating on the changed network
inevitably leads to a long running time, which significantly
reduces efficiency. In this paper, we observe from real-world
traces that the evolution of social networks follows the
preferential attachment rule and that the influential nodes are
mainly selected from high-degree nodes. Such observations shed
light on the design of IncInf, an incremental approach that can
efficiently locate the top-K influential individuals in evolving
social networks based on previous information instead of
calculating from scratch. In particular, IncInf quantitatively
analyzes the influence spread changes of nodes by localizing the
impact of topology evolution to only local regions, and a pruning
strategy is further proposed to effectively narrow the search space
to nodes experiencing major increases or with high degrees. We
carried out extensive experiments on real-world dynamic social
networks including Facebook, NetHEPT, and Flickr. Experimental
results demonstrate that, compared with the state-of-the-art static
heuristic, IncInf achieves as much as a 21x speedup in execution
time while maintaining matching performance in terms of influence
spread.

Categories and Subject Descriptors G.2.2 [Graph Theory]: Graph
algorithms and network problems

General Terms Algorithms, Performance 

Keywords Influence maximization, incremental algorithm, evolving
social network, graph algorithm

1. Introduction 

Influence maximization (IM) is a fundamental problem that aims to
identify a small set of influential individuals so as to develop
effective viral marketing strategies in large-scale social networks
[11]. In practice, real-world social networks keep evolving over
time. For example, on Facebook, new people may join while old ones
may withdraw, and people may make new friends with each other.
Moreover, real-world social networks evolve at a surprising speed;
it is reported that as many as 1 million new accounts are created
on Twitter every day [30]. Such massive evolution of network
topology may lead to a significant transformation of the network
structure, thus raising a natural need for efficient
reidentification.

Existing research on influence maximization focuses mainly on
developing effective and efficient algorithms for a given static
social network. Although one could run any of the static influence
maximization methods, such as [7, 8, 24, 25], to find the new top-K
influential individuals whenever the network is updated, this
approach has two inherent drawbacks: (1) the running time of a
static method can be unacceptably long, especially on large-scale
social networks, and (2) whenever the network topology changes, the
influence spreads of all the nodes must be recalculated, which
leads to very high costs. Can we quickly and efficiently identify
the influential nodes in evolving social networks? Can we
incrementally update the influential nodes based on previously
known information instead of frequently recalculating from scratch?

Unfortunately, the rapidly and unpredictably changing topology of a
dynamic social network poses several challenges for the
reidentification of influential users. On one hand, the
interconnections between nodes in real-world social graphs are
rather complicated; as a result, even one small change in topology
may affect the influence spreads of a large number of nodes, not to
mention the massive changes in large-scale social networks. It is
very difficult to efficiently compute the changes of influence
spreads for all the nodes after the evolution. On the other hand,
since large-scale social networks contain a great number of nodes,
effectively limiting the range of potential influential nodes and
minimizing the amount of calculation is a very challenging problem.

To address these challenges, we investigate the dynamic
characteristics exhibited during the evolution of real-world social
networks. Through tests on three real-world dataset traces,
Facebook, NetHEPT and Flickr, we observe that, first, the growth of
a social network is mainly based on the preferential attachment
principle [3], that is, newly arriving edges prefer to attach to
nodes with higher degree, which naturally leads to the
“rich-get-richer” phenomenon; and second, the top-K influential
nodes are mainly selected from those high-degree nodes. These
observations imply that the influence changes of some nodes have no
impact on the top-K selection, and thus can be pruned to reduce the
amount of calculation. Motivated by this, we propose IncInf, an
incremental method to identify the top-K influential nodes in
evolving social networks instead of recalculating from scratch,
thus significantly improving the efficiency and scalability needed
to handle extraordinarily large-scale networks. To summarize, the
main contributions of IncInf are as follows:

First, we design an efficient approach to quantitatively analyze
the influence spread changes from network topology evolution by
adopting the idea of localization. A tunable parameter is provided
to trade off between efficiency and effectiveness.

Second, we propose a pruning strategy that effectively narrows the
search space to nodes experiencing major increases or with high
degrees, based on the changes of influence spread and the previous
top-K information.

Third, we conduct extensive experiments on three dynamic real-world
social networks. Compared with the state-of-the-art static
algorithm, IncInf achieves up to a 21x speedup in execution time
while providing matching influence spread. Moreover, IncInf scales
better to extraordinarily large-scale networks.

The remainder of this paper is organized as follows. Section 2
presents related preliminaries and the problem definition. Section 3
shows the structural evolution characteristics of dynamic social
networks that we observe from three datasets: Facebook, NetHEPT
and Flickr. Section 4 details the design of our incremental
algorithm IncInf. The performance of IncInf is evaluated by
comprehensive experiments in Section 5. We present related work in
Section 6 and conclude in Section 7.

2. Preliminaries and Problem Statement 

In this section, we illustrate the definition of social network and the 
influence diffusion model that we will use throughout the paper, 
and then give the problem definition of influence maximization in 
evolving networks. 

2.1 Preliminaries on Influence Maximization 

Social Network. A social network is formally defined as a directed
graph $G = (V, E, P)$ where node set $V = \{v_1, v_2, \ldots, v_n\}$
denotes entities in the social network. Each node can be either
active or inactive, and will switch from being inactive to being
active if it is influenced by other nodes. Edge set
$E \subseteq V \times V$ is a set of directed edges representing the
relationship between different users. Take Twitter as an example. A
directed edge $(v_i, v_j)$ will be established from node $v_i$ to
$v_j$ if $v_i$ is followed by $v_j$, which indicates that $v_j$ may
be influenced by $v_i$. $P$ denotes the influence probability of
edges; each edge $(v_i, v_j) \in E$ is associated with an influence
probability $p(v_i, v_j)$ defined by function $p : E \to [0, 1]$.
If $(v_i, v_j) \notin E$, then $p(v_i, v_j) = 0$.


Algorithm 1 Basic Greedy

1: Initialize $S = \emptyset$
2: for i = 1 to K do
3:   Select $v = \arg\max_{v_i \in V \setminus S} (\sigma(S \cup \{v_i\}) - \sigma(S))$
4:   $S = S \cup \{v\}$
5: end for


Independent Cascade (IC) Model. The IC model is a popular diffusion
model that has been well studied in [7, 17, 19, 25, 32]. Given an
initial set S, the diffusion process of the IC model works as
follows. At step 0, only nodes in S are active, while other nodes
stay in the inactive state. At step t, each node $v_i$ that has just
switched from being inactive to being active has a single chance to
activate each currently inactive neighbor $v_j$, and succeeds with
probability $p(v_i, v_j)$. If $v_i$ succeeds, $v_j$ will become
active at step t+1. If $v_j$ has multiple newly activated neighbors,
their attempts to activate $v_j$ are sequenced in an arbitrary
order. Such a process runs until no more activations are possible
[19]. We use $\sigma(S)$ to denote the influence spread of the
initial set S, which is defined as the expected number of active
nodes at the end of influence propagation.
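
The diffusion process above can be approximated by Monte Carlo
simulation. The following is a minimal sketch, not code from the
paper: the adjacency-list layout (`graph` maps a node to a list of
`(neighbor, probability)` pairs) and the function names
`simulate_ic` and `estimate_spread` are assumptions made for
illustration. A queue replaces the explicit time steps, which is
enough here because each newly activated node still gets exactly one
attempt on each inactive neighbor.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the number of active nodes.

    graph: dict mapping node -> list of (neighbor, probability) pairs
           (hypothetical layout chosen for this sketch).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        # u has a single chance to activate each inactive out-neighbor.
        for v, p in graph.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, runs=10000, seed=0):
    """Estimate sigma(S) as the average cascade size over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, rng) for _ in range(runs))
    return total / runs
```

A larger `runs` value lowers the variance of the estimate at a
proportional cost in running time.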

Basic Greedy Algorithm. Richardson and Domingos [11, 28] first
introduced the influence maximization problem on static networks in
2001. In [19], Kempe et al. propose a basic hill-climbing greedy
algorithm as shown in Algorithm 1. The proposed greedy algorithm
works in K iterations, starting with an empty set S (line 1). In
each iteration, a node $v_i$ which brings the maximum marginal
influence spread
$\sigma_S(v_i) = \sigma(S \cup \{v_i\}) - \sigma(S)$ is selected to
be included in S (lines 3 and 4). The process ends when the size of
S reaches K (line 2). However, this algorithm has a serious
efficiency drawback due to the compute-intensive influence spread
calculation. Several recent studies [7, 8, 14, 17, 18, 24, 25, 32]
aimed at addressing this efficiency issue.

Figure 1: Number of Nodes and Edges per month of the Facebook
dataset.
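
The hill-climbing loop of Algorithm 1 can be sketched as follows.
This is an illustrative sketch only: `sigma` is a hypothetical
callable that returns the influence spread of a seed set (for
example a Monte Carlo estimate), and `greedy_seeds` is a name
introduced here, not one used by the paper.

```python
def greedy_seeds(nodes, sigma, k):
    """Pick up to k seeds by repeatedly adding the node with the largest marginal gain.

    nodes: list of candidate nodes (ties are broken by list order).
    sigma: hypothetical callable mapping a set of seeds to its spread.
    """
    seeds = set()
    order = []
    for _ in range(min(k, len(nodes))):
        base = sigma(seeds)
        # Marginal gain of v is sigma(S U {v}) - sigma(S).
        best = max(
            (v for v in nodes if v not in seeds),
            key=lambda v: sigma(seeds | {v}) - base,
        )
        seeds.add(best)
        order.append(best)
    return order
```

Each round evaluates `sigma` once per remaining node, which is the
cost that the efficiency-oriented studies cited above try to avoid.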

2.2 Formal Definition of IM problem in Evolving Networks 

This paper differs from previous work by considering the dynamic
nature of online social networks. Real-world social networks are
not static but keep evolving gradually over time. The evolution of
large social networks has raised new sets of questions; among them,
one interesting yet challenging problem is how to quickly identify
the top-K influential users when the topology of the network
changes.

To solve such a problem, we define an evolving network
$\mathcal{G} = (G^0, G^1, \ldots, G^t)$ as a sequence of network
snapshots evolving over time, where $G^t = (V^t, E^t, P^t)$ is the
network snapshot at time t.
$\Delta G^t = (\Delta V^t, \Delta E^t, \Delta P^t)$ denotes the
structural change of network graph $G^t$. Obviously, we have
$G^{t+1} = G^t \cup \Delta G^t$. The influence maximization problem
is then defined as follows:

Given: The social network $G^t$ at time t, the top-K influential
nodes $S^t$ in $G^t$, and the structural evolution $\Delta G^t$ of
graph $G^t$.

Objective: To identify the influential nodes
$S^{t+1} \subseteq V^{t+1}$ of size K in $G^{t+1}$ at time t+1, such
that the influence spread $\sigma(S^{t+1})$ is maximized at the end
of influence diffusion.

3. Observations of Social Network Evolution 

In this section, we study some patterns of social network evolution.
The numbers of nodes and edges are first investigated in Section 3.1
to examine the growth of users and interconnections over time. Then,
we look into the degree distribution of nodes and the preferential
attachment rule for new edges in Section 3.2. We further examine the
relation between the influence and the degree of a node in Section
3.3. We study three network traces, Facebook, NetHEPT and Flickr,
whose detailed descriptions can be found in Section 5. Here we only
show the results on Facebook; the evolution trends on the other
datasets are qualitatively similar and are thus omitted.

3.1 How Fast does the Network Evolve? 

Nodes and edges are the basic elements of the social network
topology. In this subsection, we use the number of nodes and edges
to examine the growth of users and interconnections over time.
Figure 1 illustrates the number of nodes and edges over the entire
trace period on the Facebook dataset; we take a snapshot per month.
From Figure 1, we observe a linear increase in the number of nodes,
which indicates that a steady number of new users joined the network
each month. The number of edges, in contrast, goes up almost
exponentially: the number of edges after 14 months is 25.6x that of
the initial graph, and it rises to 112.9x after 28 months. Such
rapid growth of nodes and edges raises a natural need to efficiently
find the most influential nodes after the topology evolution.

Figure 2: Degree distribution and preferential attachment on
Facebook.

Figure 3: The relation between the influence spread and the degree
in Facebook.

3.2 What is the Pattern of Network Topology Evolution? 

Understanding the pattern of network topology evolution is of
primary importance for designing efficient influence maximization
algorithms for evolving social networks. In this subsection, we
further investigate the degree distribution of nodes and the
preferential attachment rule [3, 4, 23] for newly arriving edges.
Figure 2a shows the degree distribution of the final Facebook graph
in log-log scale. As expected, it mainly follows the well-known
power-law distribution. A large percentage of the users have only a
small number of links with other users, while there exist some “hub”
nodes with an extremely large number of connections. This is
consistent with what is known about real-world networks.

We also study the preferential attachment rule, or in other words,
the “rich-get-richer” rule [12, 20], which postulates that when a
new node joins the network, it creates a number of edges, where the
destination node of each edge is chosen with probability
proportional to the destination’s degree. This means that new edges
are more likely to connect to nodes with high degree than to ones
with low degree. This is reasonable in reality; Lady Gaga gains
30,000 new followers on average every day [21], which is
unimaginable for any ordinary individual. The results on the
Facebook dataset are shown in Figure 2b, where the x axis is the
degree of different nodes and the y axis is the average number of
new edges attached to nodes of that degree. Note that both the x and
y axes are in log scale. From Figure 2b we can see that the degree
of users in Facebook is linearly correlated with the number of new
links created. This suggests that high-degree nodes get
super-preferential treatment. Consequently, the influence spread
change should be considerably large for the influential nodes, while
there may be only a small change, or even none, for ordinary people.

3.3 What is the Relation between Influence and Degree? 

Examining the relation between the influence and the degree of a
node can help us understand the effect of degree changes on the
influence spread of nodes. For this reason, we run the static
MixGreedy algorithm [7] on the final graph and identify the top-50
influential nodes. The results on the Facebook dataset are
illustrated in Figure 3, where the x axis is the degree rank of
different nodes (we only show the top 150). All the selected
influential nodes have a large degree. In particular, among the 50
nodes, 48 rank in the top 100 of all 61,096 nodes in terms of
degree, and the other two rank 102 and 111 respectively. On the
NetHEPT and Flickr datasets, the top-50 influential nodes are
selected from the top 1.79% and 0.84% of nodes in degree,
respectively. This demonstrates that the top-K influential nodes are
mainly selected from those with large degrees. However, it is worth
noting that the top-K influential nodes in influence maximization
are usually not the top-K nodes ranked by degree, since the
influence spreads of different nodes may overlap with each other.

4. IncInf Design

In this section, we present the detailed design of IncInf, an
incremental approach to solving the influence maximization problem
on dynamic social networks. The main idea of IncInf is to make full
use of the valuable information inherent in the network structural
evolution and the previous influential nodes, so as to substantially
narrow the search space of influential nodes. In this way IncInf can
significantly reduce the computational complexity and improve
efficiency. Figure 4 briefly illustrates the general idea of IncInf
in dynamic social networks. The top-K influential nodes $S^{t+1}$ of
$G^{t+1}$ at time t+1 are incrementally identified based on the
previous influential nodes $S^t$ at time t and the structural change
$\Delta G^t$ from $G^t$ to $G^{t+1}$. In particular, we design an
efficient method to quantitatively analyze the impact of different
structural changes on the influence spread of nodes by adopting the
idea of localization (Section 4.2), and propose a pruning strategy
to reduce the number of potential influential nodes (Section 4.3).
We first describe six types of basic operation of topology evolution
in dynamic networks in Section 4.1.

Figure 4: IncInf Design.

Table 1: Details of six types of basic operation

| Operation | Description | Impact on influence spread |
|---|---|---|
| addNode(u) | add a new node u into the current network | the influence spread of u is set to 1 |
| removeNode(u) | delete an existing node u from the network | the influence spread of u is set to 0 |
| addEdge(u, v, w) | introduce a new edge (u, v) with p(u, v) = w | the influence spread of all the nodes that can reach u may be increased |
| removeEdge(u, v) | remove an existing edge (u, v) from the network | the influence spread of all the nodes that can reach u may be decreased |
| addWeight(u, v, $\Delta w$) | increase p(u, v) by $\Delta w$ | the influence spread of all the nodes that can reach u may be increased |
| decWeight(u, v, $\Delta w$) | reduce p(u, v) by $\Delta w$ | the influence spread of all the nodes that can reach u may be decreased |

4.1 Basic operations of Topology Evolution 

The evolution of a social network, when reflected in its underlying
graph, can be summarized into six categories: inserting or removing
a node, introducing or deleting an edge, and increasing or
decreasing the influence probability of an edge. We denote the six
types of topology change as addNode, removeNode, addEdge,
removeEdge, addWeight and decWeight. The detailed descriptions and
their effects on influence spread are shown in Table 1.

It should be noted that only after the addNode operation can node u
establish links (addEdge) or sever links (removeEdge) with other
nodes, and node u can only be removed when all its associated edges
are deleted. Moreover, a weight operation can be equivalently
decomposed into two edge operations. For example,
addWeight(u, v, $\Delta w$) can be divided into removeEdge(u, v)
and addEdge(u, v, w + $\Delta w$), supposing the previous weight of
edge (u, v) is w.
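
The decomposition of weight operations into edge operations can be
sketched as follows. The `Change` record and the `decompose`
function are hypothetical names introduced for this illustration,
and the `weights` dictionary (mapping an edge `(u, v)` to its
current probability) is an assumed representation; only the six
operation names come from Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One topology change (hypothetical record type for this sketch)."""
    op: str                    # one of the six operation names in Table 1
    u: str
    v: Optional[str] = None
    w: Optional[float] = None  # edge probability, or the delta for weight ops


def decompose(change, weights):
    """Rewrite addWeight/decWeight as removeEdge followed by addEdge.

    weights: dict mapping (u, v) -> current influence probability.
    Other operations are returned unchanged.
    """
    if change.op not in ("addWeight", "decWeight"):
        return [change]
    old = weights[(change.u, change.v)]
    sign = 1.0 if change.op == "addWeight" else -1.0
    new = old + sign * change.w
    return [
        Change("removeEdge", change.u, change.v),
        Change("addEdge", change.u, change.v, new),
    ]
```

With this rewriting, a change stream only needs handlers for the
node and edge operations.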

4.2 Influence Spread Changes 

As discussed above, whenever an edge (u, v) is introduced into or
removed from the social network, the influence spread of all the
nodes that can reach node u may change. However, real-world social
networks exhibit small-world characteristics and the connections
between nodes are highly complicated. As a result, even one small
change in topology, such as an edge addition or removal, may affect
the influence spread of a large number of nodes, thus introducing
massive recalculations. In order to reduce the amount of
computation, we design an approach to efficiently calculate the
changes in the influence spread of nodes, which adopts the
localization idea [8] and restricts the influence spread to the
local regions of nodes.

The main idea of localization is to use the local region of each
node to approximate its overall influence spread. In particular, we
use the maximum influence path to approximate the influence spread
from node u to v. Here the maximum influence path $MIP(u, v, G)$
from node u to v in graph G is defined as the path with the maximum
influence probability among all the paths from node u to v, and can
be formally described as follows:

$$MIP(u, v, G) = \arg\max_{p \in P(u, v, G)} \{prob(p)\} \qquad (1)$$

where $prob(p)$ denotes the propagation probability of path p and
$P(u, v, G)$ denotes all the paths from node u to v in graph G. For
a given path $p = \{u_1, u_2, \ldots, u_m\}$, the propagation
probability of path p is defined as follows:

$$prob(p) = \prod_{i=1}^{m-1} p(u_i, u_{i+1}) \qquad (2)$$

Moreover, an influence threshold $\theta$ is set to trade off
between accuracy and efficiency. During the propagation process, we
only consider paths whose influence probability is larger than
$\theta$ while ignoring those with probability smaller than
$\theta$. By doing this, the influence is effectively restricted to
the local region of each node.
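
Because every edge probability lies in [0, 1], the probability of a
path never grows as the path is extended, so the maximum influence
paths from one source can be found with a Dijkstra-style search that
maximizes the product in Equation (2). The sketch below illustrates
this under stated assumptions: the adjacency-list layout and the
function name `mip_probs` are introduced here for illustration, and
the paper does not prescribe this particular search.

```python
import heapq


def mip_probs(graph, source, theta):
    """Return {v: prob(MIP(source, v, G))} for every v whose MIP probability exceeds theta.

    graph: dict mapping node -> list of (neighbor, probability) pairs
           (hypothetical layout chosen for this sketch).
    The source itself is included with probability 1.0 (empty path).
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap on probability via negation
    while heap:
        neg_p, u = heapq.heappop(heap)
        p_u = -neg_p
        if p_u < best.get(u, 0.0):
            continue  # stale heap entry, a better path to u was found
        for v, p in graph.get(u, []):
            cand = p_u * p
            # Paths at or below theta are ignored, which keeps the search local.
            if cand > theta and cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best
```

Raising `theta` shrinks the region that is explored, which is the
accuracy versus efficiency trade-off described above.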

Similarly, in our proposal we localize the impact of topology
changes on influence spread into local regions, and thus reduce the
amount of computation. Among the six types of topology change,
addNode (or removeNode) is the most straightforward since it simply
sets the influence spread of the node to 1 (or 0); addWeight,
decWeight and removeEdge are methodologically similar to addEdge.
Consequently, in the following we take addEdge as an example to show
which nodes’ influence spreads need to be updated and how to
determine those changes when a new edge is added to the graph.

Consider the case when a new edge e = (u, v, w) is introduced
between two existing nodes u and v. We denote the graph before and
after such a topology change as $G^t$ and $G^{t'}$, and the current
seed set as S. The detailed algorithm is described in Algorithm 2.
According to the principle of localization [8], if the propagation
probability w is smaller than the specified threshold $\theta$, or
not bigger than the probability of $MIP(u, v, G^t)$, edge e can be
simply neglected and there is no need to update any node’s influence
spread (lines 1-3). Otherwise, the newly added edge e becomes
$MIP(u, v, G^{t'})$. As a result, each node i whose maximum
influence path to u has an influence probability larger than
$\theta$ is likely to experience a rise in influence spread
(line 4), because node i may influence more nodes through the new
edge e. We then check the probability of the maximum influence path
from i to u and its successors in $G^t$ and $G^{t'}$. Based on the
two probabilities, we divide the problem into two cases.

Algorithm 2 Edge addition

Input: a new edge e = (u, v, w), graph $G^t$.
Output: The influence spread changes of nodes in $G^t$.

1: if $w < \theta$ or $w \le prob(MIP(u, v, G^t))$ then
2:   return;
3: end if
4: for each node i with $prob(MIP(i, u, G^t)) > \theta$ do
5:   for each node j with $prob(MIP(v, j, G^t)) > \theta$ do
6:     if $prob(MIP(i, j, G^t)) < \theta$ and $prob(MIP(i, j, G^{t'})) > \theta$ then
7:       deltaInf[i] += $prob(MIP(i, j, G^{t'})) \times (1 - prob(j, S))$
8:     end if
9:     if $prob(MIP(i, j, G^{t'})) > \theta$ and $prob(MIP(i, j, G^t)) > \theta$ then
10:      deltaInf[i] += $(prob(MIP(i, j, G^{t'})) - prob(MIP(i, j, G^t))) \times (1 - prob(j, S))$
11:    end if
12:  end for
13: end for

The first case is when the probability of the maximum influence path
from i to j in $G^t$ is smaller than $\theta$ while that in
$G^{t'}$ is larger than $\theta$ (lines 5-6). Here j denotes a node
for which the probability of $MIP(v, j, G^t)$ is larger than
$\theta$. In such a case, node i builds a new path to j through the
new edge e, which increases the influence spread of i by
$prob(MIP(i, j, G^{t'})) \times (1 - prob(j, S))$ (line 7). Here
$prob(j, S)$ is the probability that node j is influenced by the
current seed set S, which is defined as follows:

$$prob(j, S) = \begin{cases} 1, & \text{if } j \in S \\ 1 - \prod_{w \in n(j)} \left(1 - prob(w, S) \cdot p(w, j)\right), & \text{if } j \notin S \end{cases}$$

Here $n(j)$ denotes the in-neighbour set of j.

The second case is when the probability of the maximum influence
path from i to j is larger than $\theta$ in both $G^t$ and $G^{t'}$
(lines 9-11). In this case, the influence increase of node i is
$(prob(MIP(i, j, G^{t'})) - prob(MIP(i, j, G^t))) \times (1 - prob(j, S))$.

We treat the network dynamics from $G^t$ to $G^{t+1}$ as a finite
change stream $c_1, c_2, \ldots, c_i, \ldots$ where each change
$c_i$ is one of the six topology changes we described above. When
all the changes in the change stream are processed, we can obtain
the influence spread change for all the nodes.
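
The edge-addition update of Algorithm 2 can be sketched as follows.
This is an illustration under several assumptions rather than the
authors' implementation: the graph is an adjacency list of
`(neighbor, probability)` pairs, `ap` is a hypothetical dictionary
holding $prob(j, S)$ for each node (missing nodes count as 0), and
maximum influence paths are recomputed from scratch with a
Dijkstra-style helper instead of being maintained incrementally. The
two cases of Algorithm 2 collapse into one expression because a path
whose probability does not exceed the threshold is treated as having
probability 0.

```python
import heapq
from collections import defaultdict


def mip_probs(graph, source, theta):
    """MIP probabilities above theta from source (helper assumed for this sketch)."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_p, u = heapq.heappop(heap)
        if -neg_p < best.get(u, 0.0):
            continue
        for v, p in graph.get(u, []):
            cand = -neg_p * p
            if cand > theta and cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = defaultdict(list)
    for u, edges in graph.items():
        for v, p in edges:
            rev[v].append((u, p))
    return rev


def add_edge_delta(graph, u, v, w, theta, ap):
    """Influence-spread increase per node after adding edge (u, v) with probability w."""
    delta = defaultdict(float)
    old_uv = mip_probs(graph, u, theta).get(v, 0.0)
    if w < theta or w <= old_uv:  # lines 1-3: the new edge can be neglected
        return delta
    new_graph = {x: list(es) for x, es in graph.items()}
    new_graph.setdefault(u, []).append((v, w))
    upstream = mip_probs(reverse(graph), u, theta)  # nodes i that reach u
    downstream = mip_probs(graph, v, theta)         # nodes j reachable from v
    for i in upstream:
        before = mip_probs(graph, i, theta)
        after = mip_probs(new_graph, i, theta)
        for j in downstream:
            gain = after.get(j, 0.0) - before.get(j, 0.0)
            if gain > 0:  # covers both case 1 (before is 0) and case 2
                delta[i] += gain * (1.0 - ap.get(j, 0.0))
    return delta
```

Recomputing the paths for every upstream node is wasteful on large
graphs; the sketch favors readability over the efficiency that the
localization idea is meant to deliver.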


4.3 Potential Top-K Influential Users Identification

Inspired by the observations of Section 3, we design a pruning
strategy to reduce the search space of potential influential nodes
in this subsection. It is assumed that we only know which nodes are
the top-K influential nodes in graph $G^t$, but not their detailed
influence spreads. The reasons are twofold. First, several influence
maximization algorithms, such as DegreeDiscount [7] and SA [17], do
not calculate the influence spread information to identify
influential users, so such information is unavailable. Second, even
when this information is available, storing it costs as much as
$O(nK)$ memory space, where n is the number of nodes in $G^t$. Since
real-world social networks are typically of large scale, this would
introduce serious storage overhead and directly affect scalability.

From the preferential attachment rule, we know that the influence
spread changes of high-degree nodes should be much greater than
those of ordinary nodes. Moreover, according to the power-law
distribution, such high-degree nodes only account for a small part
of all nodes. Consequently, we can pick out only the nodes
experiencing major increases or with high degrees, because these
nodes have great potential to become the top-K influential nodes in
$G^{t+1}$. We then calculate the actual influence spread only for
these selected nodes while ignoring the others. In this way, a large
percentage of nodes are pruned and the search space is largely
narrowed. It should be noted that a smart pruning strategy is of key
importance, since a poor selection might either hurt the efficiency
or reduce the accuracy in terms of influence spread. We describe the
details of our pruning strategy as follows:

1. In the ith iteration, if the influence spread of the previous
influential node $S_i^t$ increases in $G^{t+1}$, the chosen nodes
are those with a larger influence spread change than
$deltaInf[S_i^t]$.

In most cases, the influential nodes will attract a great number of
new nodes and establish new links. Thus, their influence spreads
will increase drastically. In such a case, the nodes whose influence
spread changes are smaller than those of the influential nodes
cannot become the most influential node in $G^{t+1}$. Therefore,
when the influence spread of the previous influential nodes
increases, we only select those nodes whose influence spread changes
are larger than those of the influential nodes in $G^t$. According
to the preferential attachment rule, such a pruning method can
greatly narrow the search space and reduce the amount of
computation.

2. In the ith iteration, if the influence spread of the previous
influential node $S_i^t$ decreases in $G^{t+1}$, in addition to
qualification 1, the nodes are further required to hold a
sufficiently large degree or experience a sufficiently great
increase. In order to formally define “large degree” and “great
increase”, we set a threshold p to trade off between running time
and influence spread. The nodes with sufficiently large degrees (or
great increase) are defined as the set of nodes $v_j$ whose degree
(or degree increase ratio) is among the top p percent of all nodes
in $G^{t+1}$. The degree increase ratio of $v_j$ is defined as
$degree_j^{t+1} / degree_j^t$, where $degree_j^t$ denotes the degree
of node $v_j$ in graph $G^t$. Experimental results in Section 5 will
demonstrate that 5% may stand as a good tradeoff between running
time and influence spread.

It should be noted that although it rarely happens that the
influence spread of a previous influential node decreases during the
evolution, we consider this case here for completeness. In this
case, we select nodes beyond qualification 1 because the number of
nodes satisfying qualification 1 alone is relatively large, which
would lead to massive computation. In reality, a node with small
degree has only a very low probability of becoming an influential
node. In order to select only the most promising nodes, we refine
the requirement and additionally select the nodes with large degree
and large increase. Consequently, the search space is strictly
circumscribed and the computational complexity is greatly reduced.
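
The two qualifications above can be sketched as a candidate filter.
This is a literal reading of the text under stated assumptions, not
the authors' code: `select_candidates`, its parameters and the
dictionary layout are hypothetical, an unchanged influence spread
(a change of exactly 0) is treated like an increase, the degree and
ratio conditions are combined with "or" as in the definition, and
nodes that did not exist in the earlier snapshot are left out of the
ratio ranking because their ratio is undefined.

```python
def select_candidates(delta_inf, degree_old, degree_new, prev_seed, p_percent=5.0):
    """Return the set of potential influential nodes for one iteration.

    delta_inf:  dict node -> influence spread change (missing nodes count as 0).
    degree_old: dict node -> degree in the earlier snapshot.
    degree_new: dict node -> degree in the later snapshot (defines the node set).
    prev_seed:  the previous influential node for this iteration.
    p_percent:  threshold p, in percent.
    """
    threshold = delta_inf.get(prev_seed, 0.0)
    # Qualification 1: a larger change than the previous influential node.
    cands = {v for v in degree_new if delta_inf.get(v, 0.0) > threshold}
    if threshold >= 0:
        return cands
    # Qualification 2: additionally require top-p% degree or top-p% degree ratio.
    top_n = max(1, int(len(degree_new) * p_percent / 100.0))
    by_degree = sorted(degree_new, key=degree_new.get, reverse=True)[:top_n]
    ratio = {
        v: degree_new[v] / degree_old[v]
        for v in degree_new
        if degree_old.get(v, 0) > 0
    }
    by_ratio = sorted(ratio, key=ratio.get, reverse=True)[:top_n]
    return cands & (set(by_degree) | set(by_ratio))
```

Only the nodes returned by the filter need their actual influence
spread computed, which is where the saving comes from.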

After the potential nodes are selected, we calculate the actual
influence spread of these nodes in $G^{t+1}$ and select the one with
the maximum influence spread in each iteration. Algorithm 3 outlines
the design of our proposed algorithm IncInf. IncInf iterates for K
rounds (line 2) and in each round selects one node providing the
maximum marginal influence spread. Lines 3-5 calculate the influence
spread change of each node caused by the topology evolution. Nodes
with great potential to become top-K influential are selected
(line 6) and their influence spreads are computed in $G^{t+1}$
(lines 7-9). The node providing the maximal marginal gain is then
selected and added to the set $S^{t+1}$ (lines 10-11).

Algorithm 3 IncInf

Input: $G^t$, $S^t$, and $G^{t+1}$.
Output: the top-K influential nodes $S^{t+1}$ in $G^{t+1}$.

1: Initialize $S^{t+1} = \emptyset$;
2: for i = 1 to K do
3:   for each topology change $c_j$ from $G^t$ to $G^{t+1}$ do
4:     calculate the influence spread change deltaInf[·];
5:   end for
6:   select a set of potential nodes as pn according to pruning strategy;
7:   for each node $v_j \in pn$ do
8:     calculate the marginal influence spread $\sigma_{S^{t+1}}(v_j)$;
9:   end for
10:  select $v_{max} = \arg\max_{v_j \in pn} (\sigma_{S^{t+1}}(v_j))$;
11:  $S^{t+1} = S^{t+1} \cup \{v_{max}\}$;
12: end for

Table 2: Summary information of the real-world social networks

| Datasets | Nodes: Initial Number | Nodes: Final Number | Nodes: Growth | Edges: Initial Number | Edges: Final Number | Edges: Growth |
|---|---|---|---|---|---|---|
| Facebook | 12,364 | 61,096 | 394% | 73,912 | 905,665 | 1125% |
| NetHEPT | 5,802 | 29,555 | 409% | 57,765 | 352,807 | 511% |
| Flickr | 1,620,392 | 2,570,535 | 58.6% | 17,034,807 | 33,140,018 | 94.5% |
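
The control flow of Algorithm 3 can be sketched as follows. The
sketch only shows the outer loop: `select_candidates` and
`marginal_gain` are hypothetical callables standing in for the
pruning strategy (line 6) and the marginal influence spread
computation (line 8), and their signatures are assumptions made for
this illustration.

```python
def inc_inf(k, prev_seeds, select_candidates, marginal_gain):
    """Incrementally pick k seeds for the new snapshot.

    prev_seeds:        list of the previous top-K nodes, in selection order.
    select_candidates: hypothetical callable (i, prev_seed, seeds) -> iterable of nodes.
    marginal_gain:     hypothetical callable (node, seeds) -> marginal influence spread.
    """
    seeds = []
    for i in range(k):
        # Line 6: prune the search space around the i-th previous seed.
        pool = [v for v in select_candidates(i, prev_seeds[i], seeds) if v not in seeds]
        if not pool:
            break  # nothing left to choose from
        # Lines 7-10: evaluate only the candidates and keep the best one.
        best = max(pool, key=lambda v: marginal_gain(v, seeds))
        seeds.append(best)  # line 11
    return seeds
```

The loop evaluates `marginal_gain` only on the pruned pool, in
contrast to the basic greedy algorithm, which evaluates every node
in every round.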

5. Experiments 

In this section, we present the experimental results of our
algorithm on identifying top-K influential nodes in dynamic social
networks. We examine two metrics, running time and influence spread,
to evaluate the effectiveness as well as the execution efficiency of
different algorithms. The experimental results are detailed in
Sections 5.2, 5.3 and 5.4.

5.1 Experimental Setup 

We choose three real-world social networks: the Facebook social
network, the NetHEPT citation network, and the Flickr social
network. Table 2 summarizes the statistical information of the
datasets.

• Facebook. This dataset is the friendship network of the New
Orleans regional network on Facebook, spanning from Sep. 2006 to
Jan. 2009 [31]. There are more than 60K users connected by as many
as 1.5M links in the social network. 41.4% of these edges contain no
time information and are thus discarded. In our experiments, the
nodes and links from Sep. 2006 to Apr. 2007 are used as the first
snapshot, and then network snapshots are recorded every 3 months.

• NetHEPT. This is an academic citation network [2] extracted from
the “High Energy Physics-Theory” section of the arXiv over the
period from 1992 to 2003, and covers the citations within a dataset
of 28K papers with 352K edges. In our experiments, the citation
links of the first three years (i.e. from 1992 to 1994) are
considered as the basic graph and the network snapshots are recorded
once a year.

• Flickr. This dataset [27] contains the user-to-user links crawled
daily from the Flickr social network over the period from Nov.
2, 2006 to Dec. 3, 2006 and again from Feb. 3, 2007 to May
18, 2007, representing a total of 104 days of growth. In total, there
are 2.5M Flickr users and 33M links. During this period
of observation, over 9.7 million new links were formed and over
950,000 new users joined the network. In our experiments, we
use the network before Nov. 2, 2006 as the basic graph, and
another five snapshots are recorded on Dec. 3, Feb. 3, Mar. 3,
Apr. 3, and May 18.
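The snapshot construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: edges are given as hypothetical `(u, v, timestamp)` triples with numeric timestamps, snapshots are cumulative, and only edge additions are modeled. The function names are ours and do not come from the datasets' tooling.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def build_snapshots(
    timed_edges: Iterable[Tuple[int, int, float]],
    cutoffs: List[float],
) -> List[Set[Edge]]:
    """Return one cumulative edge set per cutoff time.

    Snapshot i holds every edge whose timestamp is <= the i-th smallest
    cutoff; edges newer than the last cutoff are ignored.
    """
    ordered = sorted(timed_edges, key=lambda e: e[2])
    snapshots: List[Set[Edge]] = []
    current: Set[Edge] = set()
    idx = 0
    for cutoff in sorted(cutoffs):
        while idx < len(ordered) and ordered[idx][2] <= cutoff:
            u, v, _ = ordered[idx]
            current.add((u, v))
            idx += 1
        snapshots.append(set(current))  # copy so later growth does not leak in
    return snapshots


def topology_changes(old: Set[Edge], new: Set[Edge]) -> Set[Edge]:
    """Edges added between two adjacent snapshots."""
    return new - old
```

The difference between two adjacent snapshots is the set of topology changes that an incremental algorithm would consume.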

We compare our algorithm with four static algorithms: MixGreedy,
ESMCE, MIA, and Random. MixGreedy is an improved
greedy algorithm on the IC model proposed by Chen et al. in [7].
ESMCE is a power-law exponent supervised estimation approach
designed by Liu et al. in [25]. MIA is a heuristic that uses the local
arborescence structures of each node to approximate the influence
propagation [8]. Random is a basic heuristic that randomly selects
K nodes from the whole dataset.

The propagation probability of the IC model is selected randomly
from 0.1, 0.01, and 0.001 for each network snapshot, and
we run the simulation 10,000 times on each network and take the
average influence spread.
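The averaging procedure above can be sketched as a Monte Carlo estimate of influence spread under the IC model. This is a minimal sketch, not the code used in the experiments: it assumes a directed graph stored as an adjacency dictionary and assumes that a probability is drawn independently for each edge. The function names and the fixed random seed are illustrative.

```python
import random
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Node = Hashable
Graph = Dict[Node, List[Tuple[Node, float]]]  # node -> [(out-neighbor, probability)]


def assign_probabilities(
    edges: Iterable[Tuple[Node, Node]],
    rng: random.Random,
    choices: Sequence[float] = (0.1, 0.01, 0.001),
) -> Graph:
    """Attach a propagation probability drawn uniformly from `choices` to each edge."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append((v, rng.choice(choices)))
        graph.setdefault(v, [])
    return graph


def simulate_ic_once(graph: Graph, seeds: Iterable[Node], rng: random.Random) -> int:
    """Run one IC cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets a single chance per out-edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(
    graph: Graph, seeds: Iterable[Node], runs: int = 10000, seed: int = 0
) -> float:
    """Average the cascade size over `runs` independent simulations."""
    rng = random.Random(seed)
    seed_list = list(seeds)
    return sum(simulate_ic_once(graph, seed_list, rng) for _ in range(runs)) / runs
```

The default of 10,000 runs mirrors the setting stated in the text; fewer runs trade accuracy for speed.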

5.2 Efficiency Study 

In this subsection, the efficiency of our proposed algorithm is
studied and compared with that of the corresponding static algorithms,
such as MixGreedy and MIA, through experiments on the Facebook, NetHEPT,
and Flickr datasets. The experiments are conducted on a PC with an
Intel Core i7 920 CPU @ 2.67 GHz and 6 GB of RAM. The running
times of the four algorithms are measured by selecting 50 seeds from
the whole dataset.

The time costs of the different algorithms are illustrated in Figure 5,
where we record the total time cost for each snapshot of
the three datasets. Since the incremental and static algorithms have
the same time cost on the initial snapshot, that snapshot is omitted
from the figure. The experimental results show that the time costs
of our algorithm on each snapshot are clearly lower than those
of the static algorithms. MixGreedy takes the longest time
among the four kinds of influence maximization algorithms. It takes
MixGreedy more than 6 hours to identify the top 50 influential
nodes on the final NetHEPT dataset, and the time is even
longer on the larger Facebook dataset. Moreover, MixGreedy is not
feasible to run on the largest dataset, Flickr, due to its prohibitively
long running time. ESMCE, benefiting from its sampling estimation
method, runs much faster than MixGreedy, but it still takes
3511 seconds on average to run on the five snapshots of
Flickr. Compared with the two greedy algorithms, the heuristic MIA
performs much better: it takes MIA only 23.8 seconds to run on
the final Facebook graph. When running on the Flickr dataset with
as many as 2.5M nodes and 33M edges, however, its efficiency is
far from satisfactory, since it still needs more than 45 minutes to
finish. In contrast, our proposed algorithm, IncInf, outperforms all the
static algorithms in terms of efficiency. In particular, IncInf is almost
four orders of magnitude faster than the MixGreedy algorithm
on the Facebook dataset. Compared with the MIA heuristic,
the speedup of IncInf is 8.41× and 6.94× on the Facebook
and NetHEPT datasets, respectively; moreover, when applied
to the largest dataset, Flickr, IncInf achieves a 20.65×
speedup on average. This is because IncInf only computes the
incremental influence spread changes and adaptively identifies the
influential nodes based on the previous influential nodes and the
current influence spread changes. The experimental results clearly
validate the efficiency advantage of our incremental algorithm IncInf.
We can also observe that, unlike the other algorithms, the running
time of IncInf is not monotone as time evolves. This is because
the running time of IncInf is closely related to the topology
change between two graph snapshots: a large change in
topology usually leads to a relatively long running time, and
vice versa. Unsurprisingly, Random runs the fastest among all the
algorithms. However, as we will show in Section 5.3, its accuracy
is much worse and is unacceptable for developing real-world viral
marketing strategies.

Figure 5: The time costs of different algorithms on three real-world datasets.

Figure 6: The effect of pruning strategy on the Facebook dataset.

We also test the effect of our pruning strategy. Here we take
the Facebook dataset as an example; the results on the other datasets
are similar and thus omitted. Unlike in the other experiments,
we record the Facebook graph from Sep. 2006 to Oct. 2007
(14 months) as snapshot A. After that, we take
a snapshot every month as snapshot B. We use IncInf to find the
top-K influential nodes in snapshot B based on those in snapshot
A. The result is shown in Figure 6. The x axis is the time interval
between snapshots A and B, and the y axis is the ratio of the number
of nodes after pruning to the total number of nodes in snapshot B.
The minimum and maximum pruning ratios are 3.90% and 5.86%,
respectively, with a mean ratio of 4.72% over all 14 time intervals
between snapshots A and B. This demonstrates that our pruning
strategy can effectively limit the search space to a small percentage
of nodes. We can also see in Figure 6 that, as the time interval
increases, the ratio generally becomes larger, although not
monotonically. This is mainly because a longer time interval means
more topology changes, and thus more nodes have the
potential to become influential.

5.3 Effectiveness Study 

In this subsection, we study the influence spread of the top-K
influential nodes selected by our algorithm and by the static
algorithms. The influence spread of each algorithm is measured
as the number of nodes influenced by the top-50 influential
nodes it selects. The higher the influence spread, the
better the effectiveness. We did not test MixGreedy
on the Flickr dataset because its running time is excessively long.

Figure 7 shows the experimental results. MixGreedy outperforms
all the other algorithms in terms of influence spread.
However, its efficiency limits its application to large-scale
datasets such as Flickr. The performance of ESMCE, MIA, and IncInf
almost matches that of MixGreedy on the Facebook dataset, while on
NetHEPT the gaps become larger but remain acceptable (only
3.4%, 4.7%, and 5.1% lower than MixGreedy on average, respectively). When
applied to the Flickr dataset, ESMCE performs the best, since
ESMCE strictly controls the error threshold through iterative sampling.
Compared with MIA, IncInf shows very close performance and is
only 2.87% lower on average over all five snapshots, which
demonstrates the effectiveness of our proposal. Random, as the baseline
heuristic, clearly performs the worst on all the graphs: its
influence spread is only 15.6%, 12.1%, and 10.9% of
that of IncInf on Facebook, NetHEPT, and Flickr, respectively.

Note that the slightly lower influence spread of IncInf has two
main causes. First, IncInf restricts the influence
to local regions to speed up the computation of influence spread
changes, which affects the effectiveness. Second, a pruning
strategy narrows down the search space based on
the influence spread changes and the previous top-K information.
Despite this slight loss in effectiveness, the disparity
is small and acceptable, as shown above. More importantly, IncInf gains
a remarkable improvement in efficiency.

5.4 Tuning of Parameters θ and η

First, we study how the localization parameter θ of
IncInf trades off efficiency against effectiveness.
We run IncInf with different values of θ on the final Facebook
and NetHEPT graphs. The running time and influence spread are
measured with seed size K = 50.

Figure 7: The influence spread of different algorithms on three datasets.

Figure 8: The effect of tuning θ on running time and influence spread.


The experimental results are shown in Figure 8. Note that the
x axis represents the reciprocal of θ. We observe that θ acts as a
tradeoff between efficiency and effectiveness: as θ decreases,
IncInf and MIA achieve better influence spread. However, this
gain comes at the cost of longer running time, i.e., poorer efficiency. For
example, when we reduce θ from 1/200 to 1/500 on the Facebook
dataset, the influence spread of IncInf increases by 15.4% while the
running time is 1.12× longer. Moreover, we can observe that the
influence spread of IncInf almost matches that of MIA for all values of
θ. For example, IncInf is only 1.87% lower than MIA in influence
spread when θ is set to 1/200 on the NetHEPT dataset. But IncInf
shows an overwhelming advantage in terms of running time. When θ
is set to 1/500 on Facebook, IncInf needs only 5 seconds to identify
the top-50 influential nodes, while it takes MIA more than 150
seconds to finish the same work. More importantly, as θ decreases,
the influence spread increases sharply at the beginning, but the
increase is no longer significant after θ is lowered to a certain
level. In contrast, the running time is almost linear in 1/θ. This
suggests that the knee point of the influence spread curve can serve
as a good tuning point for θ, where we obtain the best balance
between influence spread and running time.
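One simple way to locate such a knee point is to pick the measured point farthest from the straight line joining the two ends of the curve. The sketch below illustrates that heuristic; it is our assumption of how the knee could be found and is not a procedure described in the paper. The inputs are assumed to be 1/θ values in ascending order and the corresponding influence spreads.

```python
from typing import Sequence


def knee_index(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Index of the point farthest from the chord joining the curve's endpoints."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) points")
    # Normalize both axes to [0, 1] so that units do not dominate the distance.
    x_span = (xs[-1] - xs[0]) or 1.0
    y_min = min(ys)
    y_span = (max(ys) - y_min) or 1.0
    nx = [(x - xs[0]) / x_span for x in xs]
    ny = [(y - y_min) / y_span for y in ys]
    # The distance from (x, y) to the chord is proportional to
    # |dy * (x - x0) - dx * (y - y0)|; the constant factor is irrelevant here.
    dx, dy = nx[-1] - nx[0], ny[-1] - ny[0]
    dist = [abs(dy * (x - nx[0]) - dx * (y - ny[0])) for x, y in zip(nx, ny)]
    return max(range(len(dist)), key=dist.__getitem__)
```

Given measurements taken at several values of 1/θ, `knee_index` returns the position of the setting after which additional running time buys little extra spread.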


Next, we evaluate the sensitivity of the pruning threshold η
in terms of influence spread and running time. The results are
illustrated in Figure 9. From Figure 9 we can see that, as
η increases, the running time increases gently at the beginning
and then rises sharply. For example, when we increase
η from 1% to 5%, the running time of IncInf on the Facebook
dataset only increases from 2.13s to 8.47s, while it increases
dramatically from 8.47s to 87.35s when η is tuned from 5% to 10%.
This phenomenon is closely related to the power-law distribution
of degree in social networks: when η is set large, a relatively large
number of potential nodes are selected.

In terms of influence spread, as η increases, more nodes
are selected as potential nodes, which yields better influence
spread. Unlike the running time, the influence spread grows
rapidly at the beginning and then gradually slows down. The
influence spread on the Facebook dataset is 7854 when η is set to
1%, and rapidly grows to 13967 when η
is 5%. After that, the growth slows down, and the influence
spread is about 15091 as η increases to 10%. The reason for
this phenomenon is that the top-K influential nodes are mainly
selected from high-degree nodes. Therefore, when η becomes larger,
although more nodes are selected, their contribution to influence
spread is relatively small, and thus the growth slows down.
Based on the above observations, we suggest that 5% is
a good tradeoff between running time and influence spread.

Figure 9: The effect of tuning η on running time and influence spread.
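The role of η can be illustrated with a small candidate-selection sketch. It assumes, as a simplification, that η is the fraction of highest-degree nodes kept as potential nodes, and that nodes with a large influence spread change and the previous seeds are always kept. The names `potential_nodes`, `min_gain`, and `pruning_ratio` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Hashable, Iterable, Set

Node = Hashable


def potential_nodes(
    degree: Dict[Node, int],
    delta_inf: Dict[Node, float],
    previous_seeds: Iterable[Node],
    eta: float,
    min_gain: float,
) -> Set[Node]:
    """Select candidate seeds (assumed rule, see the note above)."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    # Keep the top eta fraction of nodes by degree.
    quota = math.ceil(eta * len(degree))
    by_degree = sorted(degree, key=lambda v: degree[v], reverse=True)[:quota]
    # Keep nodes whose influence spread rose by at least min_gain.
    risers = {v for v, d in delta_inf.items() if d >= min_gain}
    return set(by_degree) | risers | set(previous_seeds)


def pruning_ratio(candidates: Set[Node], total_nodes: int) -> float:
    """Fraction of the graph that survives pruning (the y axis of Figure 6)."""
    return len(candidates) / total_nodes
```

Under this assumed rule, the candidate set grows at least in proportion to η, which is consistent with the longer running times reported for larger η.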

5.5 Discussions 

Experimental results demonstrate that our proposed IncInf algorithm
significantly reduces the execution time relative to state-of-the-art
static influence maximization algorithms while maintaining
satisfactory accuracy in terms of influence spread. Although IncInf
performs well, it has a few limitations that leave room for further improvement.

First, IncInf directly depends on previous information about the top-K
influential nodes for effective pruning, while sometimes such
information is incomplete or even unavailable. We plan to study
this problem in future work. Second, IncInf is designed for the IC model,
which may limit its applicability. However, we believe our idea
of incremental computation for influence maximization can be
extended to other influence diffusion models.

6. Related Work 

Influence maximization on static networks has attracted a lot of
attention. The hill-climbing greedy algorithms proposed by Chen et
al. suffer from low efficiency, and many efficient algorithms have
been proposed recently to address this problem. Leskovec et al.
[24] exploit the submodularity of the influence spread function and
develop an optimized greedy algorithm, CELF, which is much faster
than the basic greedy algorithm. Chen et al. [7] propose MixGreedy,
which computes the influence spread for each seed set in a single
simulation and incorporates the CELF optimization. MIA [8]
uses the local arborescence structures of each node to approximate the
influence spread, thereby gaining efficiency by restricting
computations and updates to local regions. However, MIA only
considers static networks, while in this paper we specifically design
an incremental algorithm for evolving social networks. Recently,
Wang et al. [32] propose a Community Greedy Algorithm (CGA)
that takes community property into account. Goyal et al. propose
CELF++ [15], which further exploits the submodularity of the
spread function to avoid unnecessary re-computations of marginal
gains and considerably improves the efficiency of the CELF algorithm.
IRIE [18] is a heuristic proposed by Jung et al. that combines
an influence ranking algorithm with an influence estimation method
to achieve scalability. Chen et al. [9] propose a BatchGreedy
algorithm for active learning and demonstrate through experiments
that BatchGreedy can considerably improve the effectiveness of
previous greedy algorithms. Liu et al. [26] design a new framework
that accelerates influence maximization by leveraging the parallel
processing capability of GPUs. In [22], Lee et al. propose GIS with
a similar idea of influence localization, but they do not consider the
dynamic nature of online social networks. Cheng et al. [10] present
IMRank, which solves the IM problem by finding a self-consistent
ranking.
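The lazy evaluation idea behind CELF can be sketched as follows. This is a generic illustration of lazy greedy selection for a submodular objective, not the implementation from [24]: `spread` is a hypothetical oracle that returns the (estimated) influence spread of a seed set, for example a Monte Carlo estimate.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Set

Node = Hashable


def celf(nodes: Iterable[Node], k: int, spread: Callable[[Set[Node]], float]) -> List[Node]:
    """Lazy greedy seed selection; assumes `spread` is monotone and submodular."""
    seeds: List[Node] = []
    chosen: Set[Node] = set()
    current = spread(chosen)
    # Heap entries: (-gain, tie_breaker, node, number of seeds when gain was computed).
    heap = [(-(spread({v}) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date. By submodularity, stale gains of the other
            # nodes are upper bounds, so none of them can beat this node.
            seeds.append(v)
            chosen.add(v)
            current += -neg_gain
        else:
            # Stale entry: recompute the marginal gain and push it back.
            gain = spread(chosen | {v}) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds
```

Most nodes are never re-evaluated after the first round, which is where the speedup over the basic greedy algorithm comes from.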

The influence maximization problem on dynamic social networks
remains largely unexplored to date. Habiba et al. [16]
propose a dynamic social network model that differs from
ours. In their proposal, the network keeps evolving during the
process of influence propagation, and their goal is to find the top-K
influential nodes over such a dynamic network. In contrast
to [16], our work is based on a snapshot graph model, and our goal
is to incrementally identify the top-K influential nodes based on the
topology changes between two adjacent snapshots. Chen et al. [6]
extend the IC model to incorporate the time delay of influence
diffusion among individuals in social networks, and consider
time-critical influence maximization, in which one wants to maximize
influence spread within a given deadline. In [13], the authors
consider a continuous-time formulation of the influence maximization
problem in which information or influence can spread at
different rates across different edges. Aggarwal et al. [1] try
to discover influential nodes in dynamic social networks and
design a stochastic approach to determine the information flow
authorities using a globally forward approach and a locally
backward approach. Their influence model and target are different
from ours. Zhuang et al. [33] argue that the evolution of an online
social network cannot be fully observed, and focus on
designing a proper probing strategy so that the actual influence
diffusion process can be best uncovered with the probing nodes.

7. Conclusion and Future Work 

In this paper, we consider the influence maximization problem in
evolving social networks and propose an incremental algorithm,
IncInf, to efficiently identify the top-K influential nodes in dynamic
social networks. Taking advantage of the structural evolution of
networks and previous information on individual nodes, IncInf
substantially reduces the search space and adaptively selects influential
nodes in an incremental way. Extensive experiments demonstrate
that IncInf significantly reduces the execution time relative to
state-of-the-art static influence maximization algorithms while
maintaining satisfactory accuracy in terms of influence spread.

There are several future directions for this research. First, IncInf
has great potential to fit into modern parallel computing
frameworks. This is because IncInf restricts the computation of influence
spread changes to local regions, which could ease the partitioning
of the social graph for parallel computation. Moreover, the proposed
pruning strategy could be performed effectively in parallel. Second,
our current IncInf algorithm is derived from the basic IC model.
We believe the concept of incremental computation for influence
maximization could be extended to other influence
diffusion models, such as the classic LT model. Third, although
there have been a few studies [5, 29] on how to measure the
propagation probability, this problem is not yet well addressed,
especially for large-scale dynamic social networks.